The potential of the transformer-based survival analysis model, SurvTrace, for predicting recurrent cardiovascular events and stratifying high-risk patients with ischemic heart disease

Introduction Ischemic heart disease is a leading cause of death worldwide, and its importance is increasing with the aging population. The aim of this study was to evaluate the accuracy of SurvTrace, a survival analysis model using the Transformer (a state-of-the-art deep learning method), for predicting recurrent cardiovascular events and stratifying high-risk patients. The model’s performance was compared to that of a conventional scoring system utilizing real-world data from cardiovascular patients.
Methods This study consecutively enrolled patients who underwent percutaneous coronary intervention (PCI) at the Department of Cardiovascular Medicine, University of Tokyo Hospital, between 2005 and 2019. Each patient’s initial PCI at our hospital was designated as the index procedure, and a composite of major adverse cardiovascular events (MACE) was monitored for up to two years post-index event. Data regarding patient background, clinical presentation, medical history, medications, and perioperative complications were collected to predict MACE. The performance of two models, a conventional scoring system proposed by Wilson et al. and the Transformer-based model SurvTrace, was evaluated using Harrell’s c-index, Kaplan–Meier curves, and log-rank tests.
Results A total of 3938 cases were included in the study, with 394 used as the test dataset and the remaining 3544 used for model training. SurvTrace exhibited a mean c-index of 0.72 (95% confidence interval (CI): 0.69–0.76), which indicated higher prognostic accuracy compared with the conventional scoring system’s 0.64 (95% CI: 0.64–0.64). Moreover, SurvTrace demonstrated superior risk stratification ability, effectively distinguishing between the high-risk group and other risk categories in terms of event occurrence. In contrast, the conventional system only showed a significant difference between the low-risk and high-risk groups.
Conclusion This study based on real-world cardiovascular patient data underscores the potential of the Transformer-based survival analysis model, SurvTrace, for predicting recurrent cardiovascular events and stratifying high-risk patients.


Introduction
Ischemic heart disease remains the leading cause of death worldwide, despite advancements in treatment modalities and therapeutic technologies [1,2]. As the population continues to age, improving the prognosis and treatment of ischemic heart disease has become increasingly important. Accurate patient risk stratification is crucial for optimizing treatment, and the effectiveness of scoring systems, such as the Suita score, has been well-documented [3]. Wilson et al. have also reported that scoring models incorporating age and history of catheterization are effective in predicting post-catheterization events [4].
In recent years, rapid advancements in machine learning have shown promise in surpassing conventional methods in patient risk assessment [5,6]. Beyond standard machine learning survival analysis, new deep learning survival models have been proposed [7]. Specifically, Wang et al. found that a deep learning model known as the "Transformer", which employs an attention mechanism rather than recurrent neural networks or convolutional neural networks, is effective for survival time analysis [8,9]. The Transformer model has become pivotal in contemporary deep learning, serving as the foundation for systems like ChatGPT [10,11]. However, no studies have yet assessed the effectiveness of using the Transformer for survival analysis in the cardiovascular field. Therefore, the aim of this study was to compare and validate the accuracy of the novel Transformer-based model against a conventional risk scoring model using real-world data from cardiovascular patients.

Study design and participants
This study involved consecutive enrollment of patients who underwent percutaneous coronary intervention (PCI) at the Department of Cardiovascular Medicine, University of Tokyo Hospital, between 2005 and 2019. Within this timeframe, the initial PCI performed at our hospital was designated as the index procedure for each individual patient and used for analysis. Data were accessed and collected for research purposes from October 20, 2022 to December 28, 2022. Information that could identify individual participants was anonymized. A correspondence table was created to ensure that patient information could be accessed after collection, if necessary, while maintaining anonymity. The outcomes of these procedures were evaluated retrospectively. Data on patient background, clinical presentation, medical history, admission medications, perioperative complications, and discharge medications were extracted from the electronic health records (EHRs) of those who underwent the index PCI. Hypertension was defined as a systolic blood pressure of 140 mmHg or higher upon admission, a diastolic blood pressure of 90 mmHg or higher upon admission, or ongoing treatment with antihypertensive medications. Diabetes mellitus was defined by a hemoglobin A1c level ≥6.5% upon admission or ongoing treatment with either insulin or oral hypoglycemic agents. Dyslipidemia was defined as a low-density lipoprotein cholesterol level ≥140 mg/dL upon admission, a high-density lipoprotein cholesterol level <40 mg/dL upon admission, triglycerides ≥150 mg/dL upon admission, or ongoing use of dyslipidemia medications. Chronic kidney disease was defined as an eGFR <60 mL/minute/1.73 m², calculated using the Modification of Diet in Renal Disease (MDRD) equation [12] and serum creatinine levels upon admission modified by Japanese coefficients.
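The comorbidity definitions above can be written as simple boolean rules. The following is a minimal sketch, not the study's extraction code: the `AdmissionRecord` class and all field names are hypothetical, the lower-bound thresholds for hemoglobin A1c, LDL cholesterol, and triglycerides are assumed to be inclusive, and the eGFR is assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class AdmissionRecord:
    """Hypothetical container for values recorded at admission."""
    systolic_bp: float           # mmHg
    diastolic_bp: float          # mmHg
    hba1c: float                 # %
    ldl_c: float                 # mg/dL
    hdl_c: float                 # mg/dL
    triglycerides: float         # mg/dL
    egfr: float                  # mL/min/1.73 m^2, assumed precomputed
    on_antihypertensives: bool = False
    on_insulin_or_oral_agents: bool = False
    on_dyslipidemia_drugs: bool = False


def has_hypertension(r: AdmissionRecord) -> bool:
    # Systolic >= 140 mmHg, diastolic >= 90 mmHg, or ongoing treatment
    return r.systolic_bp >= 140 or r.diastolic_bp >= 90 or r.on_antihypertensives


def has_diabetes(r: AdmissionRecord) -> bool:
    # HbA1c >= 6.5% (assumed inclusive) or insulin / oral hypoglycemic agents
    return r.hba1c >= 6.5 or r.on_insulin_or_oral_agents


def has_dyslipidemia(r: AdmissionRecord) -> bool:
    # LDL-C >= 140, HDL-C < 40, triglycerides >= 150 mg/dL, or ongoing medication
    return (r.ldl_c >= 140 or r.hdl_c < 40
            or r.triglycerides >= 150 or r.on_dyslipidemia_drugs)


def has_ckd(r: AdmissionRecord) -> bool:
    # eGFR below 60 mL/min/1.73 m^2
    return r.egfr < 60


example = AdmissionRecord(systolic_bp=128, diastolic_bp=92, hba1c=6.4,
                          ldl_c=120, hdl_c=38, triglycerides=149, egfr=60.0)
flags = {
    "hypertension": has_hypertension(example),
    "diabetes": has_diabetes(example),
    "dyslipidemia": has_dyslipidemia(example),
    "ckd": has_ckd(example),
}
```

Keeping each definition in its own function makes the thresholds easy to audit against the text.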
Missing data constituted 1.0% of all variables in the total dataset. These missing values were addressed using the multiple imputation method [13]. This technique substituted missing data points with a set of plausible alternatives, thereby generating multiple complete datasets for analysis. Each dataset was individually analyzed, and the results were then aggregated to produce a single, comprehensive result. In this study, we used Python to generate five pseudo-complete datasets, applying multiple imputations using the Bayesian Ridge method (S1 File).
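One way to generate several pseudo-complete datasets with a Bayesian ridge regressor in Python is scikit-learn's `IterativeImputer`. The sketch below is an illustration under that assumption and is not the code in S1 File; the helper name `make_imputed_datasets`, the toy data, and the seeds are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge


def make_imputed_datasets(X, n_datasets=5, base_seed=0):
    """Return n_datasets completed copies of X (hypothetical helper)."""
    completed = []
    for m in range(n_datasets):
        imputer = IterativeImputer(
            estimator=BayesianRidge(),
            sample_posterior=True,    # draw imputations, so the copies differ
            max_iter=10,
            random_state=base_seed + m,
        )
        completed.append(imputer.fit_transform(X))
    return completed


# Toy data: four continuous variables, one of them partly missing
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)
X_missing = X.copy()
X_missing[::20, 3] = np.nan           # every 20th value of column 3 is missing

datasets = make_imputed_datasets(X_missing)
```

Setting `sample_posterior=True` is what turns a single deterministic imputation into multiple imputation: each run draws the missing values from the regressor's predictive distribution.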
To improve model interpretability and minimize multicollinearity, Pearson's correlation coefficient was used to assess the correlation among explanatory variables. Any variable exhibiting a Pearson's correlation coefficient exceeding 0.90 was omitted from the set of explanatory variables used for model training [14]. In cases where two features were highly correlated, the one with the greater overall correlation to all features was eliminated [14]. During the preprocessing phase, all continuous variables were standardized to have a mean value of 0 and a standard deviation of 1.
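The correlation filter and standardization described above might look as follows. This is a sketch under stated assumptions: the absolute value of Pearson's r is compared with the threshold, "overall correlation" is taken to be the mean absolute correlation with all other features, and the population standard deviation is used. The function names are hypothetical.

```python
import numpy as np


def drop_highly_correlated(X, names, threshold=0.90):
    """For each pair with |r| > threshold, drop the feature whose mean |r|
    with all other features is larger (assumed tie-breaking rule)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    mean_corr = corr.mean(axis=0)
    keep = np.ones(len(names), dtype=bool)
    pairs = np.argwhere(np.triu(corr, k=1) > threshold).tolist()
    # Visit the most strongly correlated pairs first
    pairs.sort(key=lambda ij: -corr[ij[0], ij[1]])
    for i, j in pairs:
        if keep[i] and keep[j]:
            drop = i if mean_corr[i] >= mean_corr[j] else j
            keep[drop] = False
    return X[:, keep], [n for n, k in zip(names, keep) if k]


def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0               # leave constant columns unscaled
    return (X - mean) / std


# Toy data: c is almost a copy of a, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.01 * rng.normal(size=500)
X = np.column_stack([a, b, c])

X_kept, kept_names = drop_highly_correlated(X, ["a", "b", "c"])
Z = standardize(X_kept)
```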
The endpoint consisted of a composite of major adverse cardiovascular events (MACE), including cardiac death, acute coronary syndrome, cerebrovascular event, and hospitalization for heart failure [4]. EHRs were used to collect data on these outcomes, as well as the period until their occurrence, for up to two years following the index procedure. Cardiac death was defined as death from acute myocardial infarction, ventricular arrhythmia, or heart failure [15]. Acute coronary syndrome was defined as nonfatal myocardial infarction or unstable angina [15]. Nonfatal myocardial infarction was defined as persistent angina accompanied by new ECG abnormalities and elevated cardiac biomarkers [15]. Unstable angina pectoris was defined as an extended episode of resting ischemic symptoms (typically exceeding 10 minutes) or a lowering of the activity threshold that induced accelerated chest pain, necessitating an unscheduled medical visit and an overnight stay (usually within 24 hours of the most recent symptoms) while not fulfilling myocardial infarction cardiac biomarker criteria [16]. Cerebrovascular events were defined as either cerebral hemorrhage or cerebral infarction. Survival time analyses were conducted on these outcomes until the respective dates of event onset. To compare the prognostic accuracy of the novel Transformer-based model with that of the conventional risk scoring model, the c-index was employed [17]. Subsequently, the risk stratification capabilities of each model were assessed by computing risk scores for every patient using the trained models. Patients in the test set were classified into high-, intermediate-, and low-risk score groups [18] and evaluated through Kaplan-Meier survival curves [19] and log-rank tests [20].
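To make the c-index concrete, the following minimal sketch computes Harrell's c-index directly from its definition: among comparable pairs, the proportion in which the patient with the earlier observed event also has the higher predicted risk. It is an O(n²) illustration with hypothetical toy data, it ignores pairs with tied times, and it is not the implementation used in the study.

```python
def harrell_c_index(times, events, risk_scores):
    """times: follow-up time; events: 1 if the event was observed, 0 if censored;
    risk_scores: higher value means higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                  # a pair is anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:   # i had the event while j was still event-free
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (months of follow-up, event indicator)
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]

c_perfect = harrell_c_index(times, events, [0.9, 0.4, 0.7, 0.1])   # 1.0
c_mixed = harrell_c_index(times, events, [0.2, 0.4, 0.7, 0.1])     # 0.5
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported c-indices should be read.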
The impact of explanatory variables on outcomes was assessed using Shapley additive explanations (SHAP) [21]. An algorithmic evaluation method rooted in game theory, SHAP uses Shapley scores to estimate the contribution of each explanatory variable to the model's prediction.
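The Shapley value underlying SHAP can be illustrated by exact enumeration on a toy model. The sketch below is not the SHAP library's algorithm (which relies on approximations to scale to many features); the risk function `toy_risk`, the baseline, and the feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        mixed = [x[k] if k in subset else baseline[k] for k in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v):
    # Hypothetical risk score: two main effects plus one interaction
    return 0.5 * v[0] + 2.0 * v[1] + 1.0 * v[0] * v[2]


phi = exact_shapley(toy_risk, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# phi is approximately [1.0, 2.0, 0.5]; the interaction term is split evenly
# between features 0 and 2, and the values sum to toy_risk(x) - toy_risk(baseline).
```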
To assess the robustness of our findings, we performed three distinct sensitivity analyses: first, by omitting missing values; second, by adjusting the percentage of test sets; and third, by excluding patients with a history of PCI. This study was conducted in accordance with the revised Declaration of Helsinki and received approval from the institutional review board of the University of Tokyo Hospital (2021238NI-(2)). Informed consent was obtained in the form of an opt-out on a website.

Modeling
To evaluate predictive accuracy for MACE, we utilized the scoring system proposed by Wilson et al. [4] and SurvTrace, a survival analysis model based on the Transformer architecture [8].
The scoring system proposed by Wilson et al. serves as a predictive model for recurrent cardiovascular disease and incorporates variables such as age, smoking history, history of diabetes or heart failure, body mass index, number of diseased vessels, and history of statin or aspirin therapy. For the purposes of this study, it was defined as a conventional scoring model. SurvTrace is an alternative survival time analysis model that employs a Transformer, a specific deep learning technique. Using an attention mechanism, this model enables efficient calculation of the effect of each variable on survival time. All computational models were implemented using Python and executed on an Nvidia Tesla A-100 80GB graphics processing unit.
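As a rough illustration of the attention mechanism mentioned above, the sketch below implements single-head scaled dot-product self-attention over a handful of embedded variables. It is a generic textbook form, not SurvTrace's architecture (see [8] for that); the dimensions, random weights, and the function name `self_attention` are hypothetical.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X has one row per explanatory variable (its embedding)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights


rng = np.random.default_rng(0)
n_variables, d_model = 4, 8
X = rng.normal(size=(n_variables, d_model))          # toy variable embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, attn = self_attention(X, Wq, Wk, Wv)
# attn[i, j] is how strongly variable i attends to variable j
```

Because every variable attends to every other variable in one step, interactions do not have to be specified by hand.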
For data partitioning, 90% of the total dataset was randomly selected to constitute the training set. Subsequently, 25% of this training set was randomly allocated for validation during the model training process. The remaining 10% of the data, which was not included in the training set, served as a test set for assessing the accuracy of the trained models. Throughout the training process, Optuna, an advanced framework for hyperparameter optimization tailored for machine learning, was employed to fine-tune the model's hyperparameters [22]. S2 File shows the SurvTrace execution code used.
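The partitioning scheme can be sketched with index shuffling alone. This is an illustration, not the code in S2 File: the helper `split_indices`, the seed, and the use of rounding to obtain whole counts are assumptions.

```python
import random


def split_indices(n, test_frac=0.10, val_frac=0.25, seed=0):
    """Shuffle indices, hold out test_frac for testing, then reserve
    val_frac of the remaining training data for validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    test, train_all = idx[:n_test], idx[n_test:]
    n_val = round(len(train_all) * val_frac)
    val, train = train_all[:n_val], train_all[n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(3938)
print(len(train_idx), len(val_idx), len(test_idx))   # 2658 886 394
```

With 3938 cases this reproduces the reported 394 test cases and 3544 training cases; the 886 validation cases follow arithmetically from the 25% rule and are not a figure stated in the text.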

Statistical analysis
Five pseudo-complete datasets were generated through the application of multiple imputation techniques to address missing values. The model's accuracy was then calculated based on these five datasets. To synthesize the findings, the five accuracy estimates derived from each model were integrated using Rubin's rules, facilitating a comparison of model performance [23].
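Rubin's rules combine the per-dataset estimates into one pooled estimate whose variance reflects both within-imputation and between-imputation uncertainty. The sketch below shows the standard formulas; the input numbers are made up for illustration and are not results from this study, and the function name is hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool m estimates and their within-imputation variances.
    Returns (pooled estimate, total variance, degrees of freedom)."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                  # pooled estimate
    u_bar = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)      # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    if b == 0:
        df = float("inf")             # no between-imputation variability
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Illustrative inputs only
pooled, total_var, df = pool_rubin(
    estimates=[1.0, 1.2, 1.4, 1.1, 1.3],
    variances=[0.04, 0.04, 0.04, 0.04, 0.04],
)
# pooled is about 1.2, total_var about 0.07, df about 21.8
```

A confidence interval would then use a t distribution with `df` degrees of freedom and standard error `total_var ** 0.5`.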
For continuous variables, measurements were expressed as either mean (± standard deviation) or median (first and third quartiles), while categorical variables were reported as counts and frequencies (%).
The models' prognostic accuracy was assessed using Harrell's c-index [18]. Additionally, the risk stratification capabilities of each model were assessed through Kaplan-Meier curves [20] and log-rank tests [21]. The p value threshold for significance was set at <0.05. All statistical analyses were performed using Python 3.7.
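For readers unfamiliar with the two stratification tools, the following sketch implements the Kaplan-Meier estimator and a two-group log-rank test from first principles. It uses only the standard library and hypothetical toy data; the study's own analysis code may differ.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - failures / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, g in zip(times, in_a) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e, g in zip(times, events, in_a) if u == t and e and g)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical toy data: follow-up time and event indicator (1 = event)
km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
# km is approximately [(2, 0.8), (3, 0.6), (5, 0.3)]

chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5,      # early events
                       [6, 7, 8, 9, 10], [1] * 5)     # late events
significant = p < 0.05
```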

Results
Between January 1, 2005, and December 31, 2019, a total of 3938 first-time PCIs were performed in our hospital. Of these, 394 were designated as the test dataset, while the remaining 3544 cases were used for model training (Fig 1). Among the patient information data collected from the EHRs at the University of Tokyo Hospital, 171 explanatory variables were used. Table 1 outlines the baseline characteristics of the key explanatory variables. The training dataset contained a significantly higher number of patients with a history of previous PCI compared with the test dataset. During the observation period, 683 subjects (17.3%) were lost to follow-up, including 610 cases in the training dataset and 73 cases in the test dataset.
SurvTrace outperformed the conventional scoring system on the c-index, registering a mean c-index of 0.72 (95% confidence interval: 0.69-0.76), as opposed to a mean c-index of 0.64 (95% confidence interval: 0.64-0.64) for the conventional scoring system (Table 2, Fig 2). In the first sensitivity analysis, cases with missing values were excluded from both training and test datasets. Post-exclusion, the training dataset comprised 2137 cases, and the test dataset contained 254 cases. The c-index for SurvTrace was 0.71, compared with 0.66 for the conventional scoring system. The second sensitivity analysis involved adjusting the proportion of the test dataset to 20%. Following this modification, the analysis was performed using one of the five pseudo-complete datasets generated by the multiple imputation method, including both training and test datasets. This adjustment yielded a c-index of 0.68 for SurvTrace and 0.66 for the conventional scoring system. In the final sensitivity analysis, after excluding patients with a history of PCI from one of the five pseudo-complete training and test datasets, the c-index for SurvTrace was 0.69, compared with 0.63 for the conventional scoring system.
This figure illustrates the flowchart of the study. Initially, all data were split into training and test datasets at a 9:1 ratio. To address missing values, multiple imputation was applied to both datasets, generating five pseudo-complete datasets for each. A separate 25% segment of the training dataset was reserved for validation. Subsequently, survival analysis was performed on each pseudo-complete dataset, and the c-index was calculated. Finally, Rubin's rules were

Discussion
This study demonstrated that SurvTrace, a predictive model using the Transformer deep learning algorithm, was effective in predicting recurrent cardiovascular events in patients with ischemic heart disease based on real-world clinical data. Compared with the conventional scoring system, SurvTrace not only demonstrated superior accuracy in event prediction but also showed an improved ability to stratify high-risk patients.
The Transformer-based SurvTrace model demonstrated significantly higher prediction accuracy for recurrent cardiovascular events in patients with ischemic heart disease, using real-world clinical data, than did the conventional scoring system. SurvTrace also demonstrated a significantly greater capacity for high-risk patient stratification relative to the conventional scoring system. The model maintained its superior performance across a range of sensitivity analyses, which included the exclusion of missing values from the training and test datasets, modification of the test set percentages, and the exclusion of patients with a history of PCI. These results are consistent with previous studies that have underscored the superiority of machine learning and deep learning algorithms over conventional scoring systems [6,18]. The high accuracy of these advanced models is likely attributable to their ability to identify complex patterns among explanatory variables, a feature not present in conventional methods. Typically, conventional scoring systems rely on linear models, selecting only statistically significant explanatory variables. Such models necessitate explicit definitions of relationships between explanatory variables to account for any interactions, thereby increasing model complexity and raising concerns about multicollinearity and overfitting as the number of variables grows. In contrast, the Transformer algorithm can directly incorporate multiple explanatory variables into its models, capturing nonlinear relationships and complex interactions among them without the need for explicit definitions. In this study, while the conventional scoring system incorporated only important variables such as age, gender, and medical history, SurvTrace used all 171 explanatory variables. This comprehensive approach to feature inclusion may contribute to its higher predictive accuracy.
The Transformer model's ability to stratify the high-risk group more accurately than the conventional scoring system has important implications for managing patients with ischemic heart disease in real-world clinical practice. Moreover, the alignment of our SHAP results with prior findings further underscores the robustness and validity of our study's outcomes. The enhanced risk stratification capabilities of the Transformer model could potentially improve clinical decision making and assist physicians in tailoring treatment plans for individual patients [24]. Recent advancements have introduced large language models capable of automatically extracting structured data from electronic medical records [25,26]. Using these language models enables automated survival time analysis and future risk stratification based on individual patient records, offering a more personalized treatment approach that may enhance intervention effectiveness and improve patient outcomes.
This study has several limitations that warrant consideration. First, our research relied on a dataset from a single institution, making it susceptible to potential selection bias. Future studies should address institution-specific biases by expanding and validating the diversity of the patient population through multicenter studies. Second, the sample size was relatively modest, comprising 3938 patients. In general, deep learning models require larger datasets to achieve high levels of accuracy; therefore, our sample size may have been insufficient. Third, although this study demonstrated the superiority of the Transformer model over the conventional scoring system, it should be noted that the model used was specific to this study. Other Transformer models not evaluated in this study may yield different results. Fourth, this study was retrospective in nature, with events meticulously tracked in the EHRs. Despite this thorough tracking, some events might have been overlooked as a result of patients relocating or transferring to other hospitals, potentially leading to selection bias. To mitigate this issue, future prospective studies employing survival analysis with the Transformer model are necessary. Lastly, missing values in the dataset were handled using multiple imputation methods to facilitate the Transformer model's application. These imputed values could introduce bias, especially for the Transformer model, as deep learning models are known to be sensitive to data noise.

Conclusion
This study demonstrated that a survival analysis model using the Transformer, a state-of-the-art deep learning method, was significantly more accurate than the conventional scoring system in predicting recurrent cardiovascular events and stratifying high-risk patients using real-world clinical data. Additional research is warranted to further optimize the performance of deep learning models for more effective risk stratification and management of patients with ischemic heart disease.
Fig 2 shows the c-indices of both models, and Fig 3 illustrates the learning curve of SurvTrace during its training process. The most accurate of the trained risk prediction models, along with its dataset, was used to evaluate risk stratification capabilities. While the conventional scoring system showed that the low-risk group experienced significantly fewer events compared with the high-risk group, it did not show a significant difference between the intermediate-risk group and the other patient groups (Fig 4). In contrast, SurvTrace revealed that the high-risk group had a significantly higher number of events than the other groups (Fig 4). Fig 5 presents the SHAP result, indicating that SurvTrace highlighted the influence of preexisting conditions, such as a history of chronic heart failure.

Fig 2. C-indices of the models. This figure shows the c-index for both the conventional scoring system and SurvTrace. The upper and lower black lines represent the upper and lower limits of the 95% confidence intervals, respectively. The orange line shows the mean c-index value calculated from five pseudo-complete datasets. https://doi.org/10.1371/journal.pone.0304423.g002

Fig 4. Kaplan-Meier curves of the models. This figure shows the Kaplan-Meier curves generated by both the conventional scoring model and SurvTrace. The blue lines represent the Kaplan-Meier curve for the low-risk group as stratified by risk scores from both models. Similarly, the orange and green lines represent the curves for the intermediate- and high-risk groups, respectively. The translucent segments of each line indicate the 95% confidence interval. https://doi.org/10.1371/journal.pone.0304423.g004

Table 1. Baseline characteristics of key explanatory variables.
Values are shown as n (%), mean ± standard deviation, or median (first and third quartiles).